The expression of the genomic information of severe acute respiratory syndrome coronavirus (SARS CoV)
involves synthesis of a nested set of subgenomic RNAs (sgRNAs) by discontinuous transcription. In SARS
CoV-infected cells, 10 sgRNAs, including 2 novel ones, were identified, which were predicted to be functional
in the expression of 12 open reading frames located in the 3' one-third of the genome. Surprisingly, one new
sgRNA could lead to production of a truncated spike protein. Sequence analysis of the leader-body fusion sites
of each sgRNA showed that the junction sequences and the corresponding transcription-regulatory sequence
(TRS) are unique for each species of sgRNA and are consistent after virus passages. Each of the two novel
sgRNAs used a variant of the TRS that has a one-nucleotide mismatch in the conserved hexanucleotide core
(ACGAAC). Coexistence of both plus and minus strands of SARS CoV sgRNAs and evidence for
derivation of the sgRNA core sequence from the body core sequence favor the model of discontinuous
transcription during minus-strand synthesis. Moreover, one rare species of sgRNA has the junction sequence AAA,
indicating that its transcription could result from a noncanonical transcription signal. Taken together, these
results provide more insight into the molecular mechanisms of genome expression and subgenomic
transcription of SARS CoV.


Severe acute respiratory syndrome (SARS) is an atypical
form of pneumonia that was first recognized in Guangdong
Province, China, in November 2002, and its causative agent
was identified as a novel coronavirus (SARS CoV) (7, 9, 14).
Coronaviruses are the largest RNA viruses, containing a
single-stranded, plus-sense RNA ranging from 27 to 31.5 kb in
size. The genomes of coronaviruses, possessing a 5' cap
structure and 3' poly(A) tail, are polycistronic and are expressed
through a poorly understood regulatory mechanism (11). The
two large open reading frames (ORFs) (1a and 1b) at the 5'
end of the genome encode the viral replicase and are
translated directly from the genomic RNA, while ORF 1b is
expressed by -1 ribosomal frameshifting (26). The 3' one-third
of the genome comprises the genes encoding the structural and
auxiliary proteins translated through six to nine nested and
3'-coterminal subgenomic RNAs (sgRNAs), but the number,
composition, and expression strategies of the 3'-proximal
ORFs vary greatly among coronaviruses, although four genes
for the structural proteins S, E, M, and N are always included
(11).

A unique feature of coronaviruses and some related viruses
in the order Nidovirales is that the viral sgRNAs contain a
common leader sequence of 55 to 92 nucleotides (nt), which is
derived from the 5' end of the genomic RNA (11). It has been
shown that the synthesis of each subgenomic mRNA involves a
discontinuous step by which the so-called 3' body sequence is
fused to the genomic 5' leader sequence (22). The fusion of
leader and body sequences during discontinuous transcription
is determined, at least in part, by cis-acting elements termed
transcription-regulatory sequences (TRS). These elements are
located both at the 5' end of the genome and at 5'-proximal
sites corresponding to the individual transcription units (5).
Although the mechanism for synthesis of sgRNAs is not fully
understood, several models have been proposed. Two major
models are leader-primed transcription (10, 12) and
discontinuous transcription during minus-strand synthesis (19, 21), and
the latter model has gained more support from recent evidence
for the existence of transcriptionally active, subgenome-sized
minus strands containing the antileader sequence and a
transcription intermediate active in the synthesis of mRNAs (21,
22, 23, 24).

The genomes of many SARS CoV isolates have been
sequenced, and they consist of approximately 29,700 nucleotides
(13, 15, 16). Fourteen ORFs have been identified, of which 12
are located in the 3'-proximal one-third of the genome (13,
25). The exact mechanisms of expression of the 3'-proximal
ORFs are unknown, but by analogy with other coronaviruses,
these ORFs are expressed through a set of sgRNAs (15). Rota
and colleagues could readily identify six sgRNAs, and later
Thiel et al. demonstrated the existence of eight sgRNAs in
SARS CoV-infected cells (15, 26). However, the exact number
of SARS CoV sgRNAs and the molecular mechanism underlying
their synthesis have not yet been clarified. Therefore,
identification of new sgRNAs and characterization of the molecular
details of the leader-body fusion in the sgRNAs will help
elucidate the regulatory mechanism of SARS CoV transcription
and replication, and this knowledge could further be used for
development of antiviral therapeutic agents and a vaccine for
the cure and prevention of this newly emerged disease.

Vol. 79, 2005
LEADER-mRNA FUSION SITES OF SARS CoV

TABLE 1. Oligonucleotides used for RT-PCR analysis of SARS CoV subgenomic RNAs

Primer      Sequence (5' -> 3')          Position^a (region)
Oligo(dT)   TTTTTTTTTTTTTTT              Poly(A) tail
SF8         CCAGGAAAAGCCAACCAACC         20-39 (leader)
SF9         CTCGATCTCTTGTAGATCTG         39-58 (leader)
SR10        CTTTCGGTCACACCCGGAC          259-241 (5' UTR^b)
SR11        TCTGAAACATCAAGCGAAAAGG       22012-21991 (S)
SR12        TGTGCTTACAAGGGCACGCTAG       26097-26076 (3a and 3b)
SR13        AATGTTTGTTTCTGGGTTGAATG      26748-26726 (M)
SR14        CGCAGCTGATAGGTATGTCG         27505-27486 (7a)
SR15        ACAAGTACGTCTCTAAATGCAGCA     28091-28068 (8b)
SR16        GGTGTTGATTGGAACGCCCTG        28348-28328 (N)
SF17        TGTAAACGTTTTCGCAATTCCG       29421-29442 (3' UTR)
SR18        TTTGTCATTCTCCTAAGAAGCTAT     29705-29725 (3' UTR)

a Numbering refers to the nucleotide coordinates of the SARS virus isolate WHU sequence (accession no. AY394850).
b UTR, untranslated region.

In this study, we showed the coexistence of both plus- and
minus-strand sgRNAs in SARS CoV-infected cells and
identified 10 sgRNAs, including two novel subgenomic mRNAs
(named 2-1 and 3-1) with noncanonical leader-body fusion
sites.

MATERIALS AND METHODS 

Virus and cells. African green monkey kidney (Vero E6) cells and baby
hamster kidney (BHK) cells were grown and maintained in Dulbecco's modified
Eagle medium and modified Eagle medium (Gibco Invitrogen Corp.),
respectively, supplemented with 10% heat-inactivated fetal bovine serum (Gibco
Invitrogen Corp.) and 100 U of penicillin and 100 µg of streptomycin (Gibco
Invitrogen Corp.) per ml. Vero cells (4 × 10^7) were infected at a multiplicity of
infection of 0.1 with SARS coronavirus strain WHU, which was isolated from a
blood sample from a patient admitted to a local hospital with characteristic signs
and symptoms of SARS (29). The complete genome sequence of SARS virus
isolate WHU was determined in a previous study (29) (GenBank accession
number AY394850).

Northern blotting. The total cellular RNA from SARS CoV-infected Vero E6
cells was extracted by using Trizol reagent (Invitrogen) according to the
manufacturer's instructions. Twenty micrograms of extracted RNA was fractionated in
a 1.2% denaturing agarose gel containing 2.2 M formaldehyde with 1× MOPS
(morpholinepropanesulfonic acid) buffer (17), transferred to a nylon membrane
(Hybond-A; Amersham Pharmacia), and UV cross-linked. The Northern blot
was probed overnight at 42°C with 32P-labeled strand-specific single-stranded
DNA probes according to the protocol of the manufacturer (Amersham
Pharmacia). The signals were detected and analyzed with a PhosphorImager and
ImageQuant software (Molecular Dynamics). The membrane was stripped of
the first probe according to the protocols provided with the Hybond A
membrane and was reprobed with the second probe.

The negative probe was complementary to the 3' ends (positions 29421 to 
29725) of SARS CoV mRNAs and was used to detect plus-sense sgRNAs. The 
positive probe, complementary to the 5' ends of viral antisense RNAs, was used 
to detect minus-strand subgenomic RNAs. 

Reverse transcriptase PCR (RT-PCR) of SARS CoV minus-strand RNA and
subgenomic mRNAs. One microgram of total cellular RNA, extracted from
SARS CoV-infected Vero cells, was reverse transcribed into single-stranded
cDNA with Moloney murine leukemia virus reverse transcriptase (Promega).
Oligo(dT)15 or the strand-specific oligonucleotide SR18 (Table 1) was used to
prime cDNA synthesis from plus-sense RNAs, while oligonucleotide SF8 (Table
1), which is complementary to the antileader sequence, was used for cDNA
synthesis from minus-sense viral RNAs under the conditions recommended by
the manufacturer (Promega). A 0.2-µl amount of cDNA product from the RT
step was used for PCR. Primers for PCR (Table 1) were originally designed on
the basis of the published SARS virus genome sequences of strains BJ01
(accession number AY278488), HKU (accession number AY278491), and Urbani
(accession number AY278741) by using OLIGO 4.1 (National Biosciences).
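
The primer set in Table 1 can be sanity-checked with a few lines of Python. The sketch below is not part of the original methods; the helper names (span_length, reverse_complement, length_mismatches) are illustrative. It compares each primer's length with the coordinate span listed for it and shows how a reverse primer can be converted to the genome-sense sequence it anneals to. Run on the entries as transcribed here, only SR18 is flagged (a 24-nt sequence against a 21-nt span); whether that reflects the original table or a transcription slip cannot be decided from this check.

```python
# Primer entries as transcribed in Table 1:
# name -> (sequence 5'->3', first coordinate, second coordinate).
TABLE1_PRIMERS = {
    "SF8": ("CCAGGAAAAGCCAACCAACC", 20, 39),
    "SF9": ("CTCGATCTCTTGTAGATCTG", 39, 58),
    "SR10": ("CTTTCGGTCACACCCGGAC", 259, 241),
    "SR11": ("TCTGAAACATCAAGCGAAAAGG", 22012, 21991),
    "SR12": ("TGTGCTTACAAGGGCACGCTAG", 26097, 26076),
    "SR13": ("AATGTTTGTTTCTGGGTTGAATG", 26748, 26726),
    "SR14": ("CGCAGCTGATAGGTATGTCG", 27505, 27486),
    "SR15": ("ACAAGTACGTCTCTAAATGCAGCA", 28091, 28068),
    "SR16": ("GGTGTTGATTGGAACGCCCTG", 28348, 28328),
    "SF17": ("TGTAAACGTTTTCGCAATTCCG", 29421, 29442),
    "SR18": ("TTTGTCATTCTCCTAAGAAGCTAT", 29705, 29725),
}


def span_length(first, second):
    """Number of nucleotides covered by an inclusive coordinate pair."""
    return abs(second - first) + 1


def reverse_complement(dna):
    """Reverse complement of a DNA string made of A, C, G, and T."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def length_mismatches(primers):
    """Names of primers whose sequence length differs from their coordinate span."""
    return [
        name
        for name, (seq, first, second) in primers.items()
        if len(seq) != span_length(first, second)
    ]


if __name__ == "__main__":
    print("Entries to re-check:", length_mismatches(TABLE1_PRIMERS))
    # Genome-sense sequence targeted by the reverse primer SR10.
    print("SR10 target:", reverse_complement(TABLE1_PRIMERS["SR10"][0]))
```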

Cloning and sequencing of the leader-body junction sequences of SARS CoV
subgenomic mRNAs and minus-strand subgenomic RNAs. RT-PCR products
were excised from the agarose gel and purified with a QIAquick gel extraction kit
(Qiagen) as recommended by the manufacturer. Purified PCR fragments were
ligated into the pGEM-T/T-Easy PCR cloning vector (Promega) or the
pMD18-T PCR cloning vector (TaKaRa) and transformed into Escherichia coli
DH5α competent cells. Screening was done by colony PCR and restriction
endonuclease digestion, and multiple independent cDNA clones were selected
and sequenced for each species of subgenomic RNA. The sequencing reaction
was carried out by using AmpliTaq DNA polymerase and universal primers with
the Big Dye Terminator cycle sequencing ready reaction kit (PE Applied
Biosystems) and analyzed on an ABI Prism 377 DNA sequencer (PE Applied
Biosystems).


TABLE 2. Names of SARS-CoV mRNAs and ORFs or genes

                      Name used in reference(s):    Junction
mRNA        ORF^a     25, 26    13      15          sequence^b
mRNA 1      1a        1a        1a      1a
            1b        1b        1b      1b
mRNA 2      S         S         S       S           10
mRNA 2-1    S'                                      5/6^c
mRNA 3      3a        3a        3       X1          11
            3b        3b        4       X2
mRNA 3-1    3b                                      5/6
mRNA 4      E         E         E       E           8
mRNA 5      M         M         M       M           12
mRNA 6      6         6         7       X3          6
mRNA 7      7a        7a        8       X4          8
            7b        7b        9
mRNA 8      8a        8a        10                  11
            8b        8b        11      X5
mRNA 9      N         N         N       N           9
            9b        9b        13

a Genes that might be expressed from the mRNA.
b Number of nucleotides in the mRNA leader-body junction sequence.
c The junction sequence is six nucleotides but contains one mismatch.
[Figure 1 graphic not reproduced. Recoverable labels: genome map with Leader, ORF 1a, ORF 1b, S, and the poly(A) tail (AAA); primers SF9 and SR11 to SR18; mRNAs 1 (1a, 1ab), 2, 2-1 (S'), 4 (E), 5 (M), 6 (6), 7 (7a, 7b), 8 (8a, 8b), and 9 (N, 9b).]

FIG. 1. Schematic representation of the genomic and subgenomic organizations of SARS CoV. The genome organization is based on the 
sequence of the SARS WHU isolate. In the upper panel, the genomic structure is shown. Known and potential ORFs are indicated by open boxes 
and are not to scale. The leader region is represented by a small solid box, and the poly(A) tail is represented by AAA. ORF 8a of isolate WHU 
contains a 2-nucleotide deletion and thus gives rise to a small ORF of 24 amino acids. Positions of forward (SF8 and -9) and reverse (SR11 to -18) 
primers used for cDNA synthesis and PCR amplification of different subgenomic RNAs are indicated by arrows under the genome. The bottom 
panel illustrates the 3'-coterminal nested set of mRNAs detected in this study. The small black boxes at the 5' ends of the genomic and subgenomic 
RNAs represent the common leader sequence. The first (grey boxes) and second (open boxes) ORFs that are located at the 5'-proximal end and 
may be expressed from the mRNAs are shown. 


RESULTS 

Features of the genomic structure of SARS CoV isolate
WHU. In the late period of the SARS outbreak, we isolated a
SARS CoV isolate (named WHU) from a blood specimen
from a SARS patient hospitalized in Hubei Province. The
genome of SARS CoV WHU was completely sequenced, and
the sequence was deposited in GenBank (accession number
AY394850). It consisted of 29,725 nucleotides, excluding the
poly(A) tail, and showed the typical genotypic features of the
SARS CoV isolates that prevailed during the late epidemic
period (29). This virus isolate was used throughout the studies
reported here, and the sequence coordinates were based on
the genomic sequence of WHU. The nomenclature of the
mRNAs and genes followed the recommendations of the
International Coronavirus Study Group (6), similar to those of
Thiel et al. (26) and Snijder et al. (25). To avoid confusion due
to different names for the same gene, a comparison with
published names is shown in Table 2.

The genomic structure of SARS CoV isolate WHU (Fig. 1)
is similar to those of other isolates but has a deletion of two
nucleotides which correspond to nucleotides 28 and 29 of ORF
8a in SARS CoV Tor2 and Urbani (13, 15). The deletion was
confirmed by sequencing multiple cDNA clones synthesized
with viral RNAs prepared from different virus passages. This
2-nt deletion leads to a shifted ORF 8a of only 24 amino acids,
with its first seven codons identical to those of other isolates.
The genomic region of ORF 8a is a hot spot for deletions
and additions during SARS CoV evolution (4, 8). The 2-nt
deletion apparently did not influence virus replication in
infected cells.

Coexistence of both plus- and minus-sense subgenomic
RNAs in SARS CoV-infected cells. The expression of genetic
information in members of the coronavirus family involves the
synthesis of a variable number of subgenomic mRNAs
depending on the particular species of coronavirus (11). Sequence analysis
showed that the SARS CoV genome contains at least 14 ORFs,
of which 12 are located in the 3' one-third of the genome and
were predicted to be expressed from sgRNAs (Fig. 1).
However, the initial analysis could identify only five sgRNAs (15),
which are probably not sufficient to express the 12 ORFs. By
Northern blot analysis with a radiolabeled strand-specific
probe which detected plus-sense RNAs (Fig. 2A), we detected
eight subgenomic mRNAs with approximate sizes of 8.3, 4.6,
3.7, 3.5, 2.9, 2.5, 2.0, and 1.7 kb. Similar results were obtained
by Thiel et al. during the period of this study (26).
Subsequently, we used a strand-specific DNA probe to detect
minus-sense strand RNAs in infected cells. As shown in Fig. 2B,
similar patterns of minus-sense sgRNAs could be seen,
indicating the existence of minus-sense RNAs of subgenome
length in SARS CoV-infected cells.

To further demonstrate that the bands detected in the 
Northern blot analysis are subgenome-length RNAs and not 
degradative or truncated products of genome-length RNAs 
and to provide conclusive evidence of the coexistence of both 
plus- and minus-sense sgRNAs, the leader-body junctions and 
surrounding regions of all of the sgRNAs detected by Northern 
blotting were amplified by RT-PCR and sequenced. 

The rationale for the cloning is that the subgenomic mRNAs 
of coronaviruses are 3' coterminal to the viral genome and 
possess a common leader sequence of about 70 nucleotides 
derived from the 5' end of the viral genome. Thus, cloning of 
the junction region of the RNA leader and body sequences 
would reveal the existence of the corresponding sgRNAs. To 
clone each possible sgRNA, we designed two common primers 
[Figure 2 blot images (panels A and B) not reproduced. Recoverable band labels: 1, 4 (3.7), 5 (3.5), 6 (2.9), 7 (2.5), 8 (2.0), and 9 (1.7).]

FIG. 2. Northern blot analysis of SARS CoV subgenomic mRNAs.
Twenty micrograms of total cellular RNA from SARS CoV-infected
Vero E6 cells was separated by electrophoresis through a 1.2%
denaturing agarose gel containing 2.2 M formaldehyde. The resolved RNA
was transferred to a nylon membrane, and 32P-labeled antisense and
sense probes containing 305 nucleotides (positions 29421 to 29725)
from the 3' end of the SARS CoV genome were used to detect
subgenomic mRNAs and minus-strand subgenomic RNAs,
respectively. The mRNA designations and their approximate sizes (in
parentheses) are indicated on the right. (A) Subgenomic mRNAs detected
by the negative probe; (B) minus-strand subgenomic RNAs detected
by the positive probe.


(SF8 and SF9) complementary to the antileader sequence and
specific reverse primers (SR11 to SR16) 200 to 600 nt
downstream from the start codon of each ORF (Table 1 and Fig. 1).
For cloning of the leader-body junction regions of plus-sense
mRNAs, the cDNA was synthesized from total RNA of SARS
CoV-infected cells with oligo(dT) and the strand-specific
primer SR18. Using the common primer SF8 and one
downstream specific primer, the junction sequences of various
sgRNAs were amplified by PCR (Fig. 3A). As shown in Fig.
3A, all of the primer combinations amplified at least one major
band of the expected size. It was not surprising that some
primer combinations gave rise to multiple PCR bands, because
the same primer combination could produce larger PCR
fragments which correspond to the next-larger sgRNAs; e.g.,
primer SR14, intended to amplify the leader-body fusion sites
of mRNA 7, could also amplify those of mRNAs 6 and 5 (Fig.
3A, lane 4), and primer SR15 could amplify the junction
sequences of mRNAs 8, 7, and 6 (Fig. 3A, lane 5). It was
expected that primer SR16, located in ORF N, could give rise to
multiple bands, but only one major band corresponding to the
junction region of mRNA 9 was observed (Fig. 3A, lane 6).
This was probably due to the high abundance of mRNA 9 and
its preferred binding with the primer SR16.

The PCR fragments were individually isolated from the
agarose gel and cloned for sequencing. At least 10 independent
clones for each junction sequence were sequenced. The
sequence data for the leader-body junction region (Fig. 4A)
revealed the existence of all eight sgRNAs detected in the
Northern blots (Fig. 2). The junction sequences underlined in
Fig. 4A are identical to the conserved core elements in the
intergenic TRS (Fig. 4B).

To confirm the existence of minus-sense sgRNAs, the same 
RNA preparation was used for cDNA synthesis with primer 



[Figure 3 gel images (panels A and B) not reproduced.]

FIG. 3. RT-PCR analysis of SARS CoV subgenomic RNAs. One
microgram of total cellular RNA from SARS CoV-infected Vero E6
cells was reverse transcribed and amplified by PCR with different
combinations of forward (SF9) and reverse (SR11 to SR16 [lanes 1 to
6, respectively]) primers. The bands representing the specific SARS
CoV sequences are indicated by arrowheads. The bands which
revealed two novel subgenomic RNAs are boxed. Lanes 1, mRNA 2
(arrowhead) and mRNA 2-1 (boxed faint band in panel A); lanes 2,
mRNA 3 (major band) and mRNA 3-1 (boxed); lanes 3, mRNA 5
(lower major band) and 4 (upper minor band); lanes 4, mRNA 7
(lower band), mRNA 6 (middle band), and mRNA 5 (upper band);
lanes 5, mRNA 8 (lower band), mRNA 7 (middle band), and mRNA
6 (upper band); lane 6, mRNA 9. (A) The cDNA used for PCR was
made with oligo(dT) or SR18 primer, and thus the sequence of the
corresponding plus-strand RNA was amplified. (B) The cDNA used for
PCR was made with primer SF8, which is complementary to the
antileader sequence, and therefore the sequence of the corresponding
minus-strand RNA was amplified.


SF8, which hybridizes only with the antileader sequences of the
minus-strand sgRNAs. The subsequent PCR amplification and
cloning were similar to those for plus-strand mRNAs. PCR
fragment analysis gave rise to a band pattern (Fig. 3B) similar
to that of plus-sense RNAs (Fig. 3A), and sequence analysis
confirmed the identity of the individual junction sequences in
the minus-strand RNAs. More minor bands were observed with
some primer combinations, such as those with primer SR13 or
SR15 (Fig. 3B, lanes 3, 4, and 5), and sequencing results showed
that they represented either larger sgRNAs or nonspecific
amplifications of cellular mRNAs or SARS CoV sequence. Taken
together, the Northern blotting and sequencing results showed
the coexistence of both plus- and minus-strand sgRNAs.

Identification of novel subgenomic RNAs. The Northern
blots could readily detect eight subgenomic RNA bands, and
the sequencing confirmed their existence. PCR amplification is
generally more sensitive for detecting sgRNAs of low
abundance. By carefully sequencing the minor bands amplified by
RT-PCR (Fig. 3), two novel sgRNAs which could not be
revealed in Northern blots were found, and these were named
2-1 and 3-1 (Fig. 1 and 4). sgRNA 2-1 (minor band in Fig. 3A,
lane 1) was obtained with primer SR11, which amplified the S
junction region as the major band (Fig. 3A). The leader-body
fusion site (ACGAGC) of sgRNA 2-1 (Fig. 4A) is located
[Figure 4 sequence panels, reconstructed as text. Junction sequences, underlined in the original figure, are set off by spaces, with extents following the junction lengths in Table 2; lowercase marks a mismatch with the leader core sequence.]

(A) mRNA junction

mRNA   Sequence
1      UCUCUAAACGAACUUUAA
2/S    UCU CUAAACGAAC AUGUU
2/1    UCUCUAA ACGAgC AUGUA
3      UCUC UAAACGAACUU AUG
3/1    UCUCUAA AaGAAC CCAUU
4/E    UCUCUAA ACGAACUU AUG
5/M    UC UCUAAACGAACU AACU
6      UCUCUAA ACGAAC GCUUU
7      UCUCU AAACGAAC AUGAA
8      UC UCUAAACGAAC AUGAA
9/N    UCUC UAAACGAAC AAAUU

(B) Intergenic TRS

mRNA   Sequence
1      UCUCUAAACGAACUUUAA
2/S    CAA CUAAACGAAC AUGUU
2/1    UUGUUAU ACGAgC AUGUA
3      CACA UAAACGAACUU AUG
3/1    CAAAUCC AaGAAC CCAUU
4/E    AGUGAGU ACGAACUU AUG
5/M    GG UCUAAACGAACU AACU
6      CUACAUC ACGAAC GCUUU
7      CCAUA AAACGAAC AUGAA
8      AG UCUAAACGAAC AUGAA
9/N    UAAA UAAACGAAC AAAUU

FIG. 4. Leader-body fusion sites of subgenomic mRNAs and their corresponding intergenic sequences. The 5' genomic leader TRS is in italic. 
The hexanucleotide core sequence of the TRS is indicated in boldface, and the mismatched nucleotides with the leader core sequence (ACGAAC) 
are in lowercase. (A) Leader-body junction sites of subgenomic mRNAs in comparison with the genomic leader TRS. The junction sequences in 
subgenomic RNAs are underlined. (B) The TRS in the intergenic regions. The body sequences that are fused with the 5' leader are underlined. 


inside the S gene, 384 nucleotides downstream from the
authentic core sequence (ACGAAC) for mRNA 2/S. In sgRNA
2-1, the first AUG codon is followed immediately by a stop
codon, UAA, and the second AUG is 43 nt downstream and in
the same reading frame as the S gene, which could result in the
synthesis of a truncated S protein (named S') missing the
N-terminal 143 amino acids. The corresponding ORF is named
ORF 2b. By fusion of the green fluorescent protein gene with
the 5' part of sgRNA 2-1, a fusion protein could be detected,
indicating the translatability of this sgRNA (data not shown).
However, the existence of protein S' in infected cells is yet to
be determined.
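
The reading-frame argument for sgRNA 2-1 can be illustrated with a small Python sketch. It is only a toy under stated assumptions: the sequence TOY_SGRNA is hypothetical (built to mimic an AUG followed directly by UAA and a second AUG further downstream, not the real sgRNA 2-1 sequence or spacing), and the function name orfs_from_aug is illustrative. For each AUG it counts the codons read before the first in-frame stop codon.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def orfs_from_aug(mrna):
    """Return (start_index, codon_count) for each AUG that reaches an in-frame stop.

    codon_count includes the AUG itself and excludes the stop codon.
    """
    orfs = []
    start = mrna.find("AUG")
    while start != -1:
        codons = 0
        for i in range(start, len(mrna) - 2, 3):
            if mrna[i:i + 3] in STOP_CODONS:
                orfs.append((start, codons))
                break
            codons += 1
        start = mrna.find("AUG", start + 1)
    return orfs


# Hypothetical toy message (NOT the real sgRNA 2-1 sequence):
# variant core, AUG + UAA, an AUG-free spacer, then a second short ORF.
TOY_SGRNA = "ACGAGC" + "AUG" + "UAA" + "CU" * 20 + "AUG" + "GCU" * 5 + "UAA"

if __name__ == "__main__":
    for start, codons in orfs_from_aug(TOY_SGRNA):
        print(f"AUG at index {start}: {codons} codon(s) before the stop")
```

In the toy message the first AUG yields a one-codon ORF, so any longer product has to start at the downstream AUG, which is the situation described for S'.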

The second novel sgRNA (3-1) (minor band in Fig. 3A, lane 
2, and B, lane 2) corresponded to ORF 3b, which had been 
predicted to be expressed from mRNA 3 (25, 26). The leader- 
body fusion site (AAGAAC) for subgenomic mRNA 3-1 is 10 
nucleotides upstream of the AUG start codon of ORF 3b and 
has a mismatch (underlined) with the leader core sequence 
(CS-L) (ACGAAC) of SARS CoV. Therefore, the existence of 
sgRNA 3-1 may indicate that ORF 3b could be expressed from 
a separate mRNA other than mRNA 3. The expression of 3b 
from sgRNA 3-1 was subsequently verified by fusion with the 
green fluorescent protein gene (data not shown). 

The leader-body fusion sites of both sgRNA 2-1 (ACGAGC)
and sgRNA 3-1 (AAGAAC) (Fig. 4A) have one nucleotide
difference (underlined) from the core sequence (ACGAAC) in the
leader TRS (TRS-L) of SARS CoV (26) but are identical to the
core sequence of TRS-B (Fig. 4), which is consistent with previous
findings that the core sequence in subgenomic mRNAs is derived
from the body TRS but not from the leader TRS (20, 27, 30).
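
The one-nucleotide differences can be located with a short comparison. This is a minimal sketch, with the helper name mismatch_positions chosen for illustration; the three hexanucleotides are the ones quoted in the text.

```python
LEADER_CORE = "ACGAAC"  # core sequence of the leader TRS, as given in the text


def mismatch_positions(core, reference=LEADER_CORE):
    """Return the 1-based positions at which `core` differs from `reference`."""
    if len(core) != len(reference):
        raise ValueError("core sequences must have equal length")
    return [i + 1 for i, (a, b) in enumerate(zip(core, reference)) if a != b]


# Fusion-site cores reported for the two novel sgRNAs.
NOVEL_CORES = {"2-1": "ACGAGC", "3-1": "AAGAAC"}

if __name__ == "__main__":
    for name, core in NOVEL_CORES.items():
        print(f"sgRNA {name}: {core} differs at position(s) {mismatch_positions(core)}")
```

For these inputs the mismatch falls at position 5 for sgRNA 2-1 and at position 2 for sgRNA 3-1.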


After sequencing of 12 independent clones of the junction
region of mRNA 3-1, one clone showed a variant fusion site
within or upstream of the AAA sequence motif (Fig. 5B),
which is three nucleotides preceding the body core sequence
(CS-B) (AAGAAC) for mRNA 3-1. This variant sequence was
confirmed by sequencing another set of independent clones
and may not represent a random event of template switching
around the CS-B (AAGAAC) during the discontinuous
transcription.

Uniqueness and stability of the junction sequences of the
subgenomic RNAs. Comparison of the leader-body junction
sequences of different sgRNAs showed that each subgenomic
mRNA has a unique fusion site that is different from all others
(Fig. 4A). The length of complementary sequences between
TRS-L and TRS-B varies from 6 nucleotides in mRNA 6 to 12
nucleotides in mRNA 5/M, although they all contain the
conserved hexanucleotide sequence (5'-ACGAAC-3'), except for
the sequences of mRNAs 2-1 and 3-1, which have one
nucleotide mismatch. The junction sequences of mRNAs 4 and 7 as
well as mRNAs 3 and 8 have the same length but carry
different extra nucleotides flanking the hexanucleotide core
sequence (Fig. 4A). The uniqueness of the fusion sites of SARS
CoV could play a regulatory role in controlling the abundances
of different mRNAs (30).
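
The junction lengths quoted above can be recomputed from the intergenic sequences of Fig. 4B. The sketch below assumes that a junction is the longest stretch, containing the core, over which the leader TRS and a body TRS are identical; that definition, the function name junction, and the index constants are assumptions made for illustration. The 18-nt sequences are taken from Fig. 4B as transcribed, each aligned so that the core occupies the same positions.

```python
LEADER_TRS = "UCUCUAAACGAACUUUAA"
CORE_START, CORE_END = 7, 13  # half-open slice holding ACGAAC in every 18-mer

# Intergenic (body) TRS sequences for the mRNAs with a canonical core.
BODY_TRS = {
    "2/S": "CAACUAAACGAACAUGUU",
    "3": "CACAUAAACGAACUUAUG",
    "4/E": "AGUGAGUACGAACUUAUG",
    "5/M": "GGUCUAAACGAACUAACU",
    "6": "CUACAUCACGAACGCUUU",
    "7": "CCAUAAAACGAACAUGAA",
    "8": "AGUCUAAACGAACAUGAA",
    "9/N": "UAAAUAAACGAACAAAUU",
}


def junction(leader, body):
    """Extend the shared core left and right while leader and body stay identical."""
    if leader[CORE_START:CORE_END] != body[CORE_START:CORE_END]:
        raise ValueError("core sequences differ; extension is not defined here")
    left, right = CORE_START, CORE_END
    while left > 0 and leader[left - 1] == body[left - 1]:
        left -= 1
    while right < len(leader) and leader[right] == body[right]:
        right += 1
    return body[left:right]


JUNCTION_LENGTHS = {
    name: len(junction(LEADER_TRS, body)) for name, body in BODY_TRS.items()
}

if __name__ == "__main__":
    for name, length in JUNCTION_LENGTHS.items():
        print(f"mRNA {name}: {junction(LEADER_TRS, BODY_TRS[name])} ({length} nt)")
```

Under this definition the lengths range from 6 nt (mRNA 6) to 12 nt (mRNA 5/M), with mRNAs 4 and 7 at 8 nt and mRNAs 3 and 8 at 11 nt, in agreement with Table 2.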

We also analyzed viral RNAs prepared from viruses at
different passages for 4 months. Sequencing of multiple cDNA
clones of each mRNA showed that the fusion sites were stable
for all mRNAs and did not change over time with virus
passage.
FIG. 5. Junction sequences of SARS CoV mRNA 3-1 and models for template switch. The upper strand in the alignments represents the
intergenic region of mRNA 3-1, and the lower strand is the genomic leader sequence. Dots indicate identity between the sequences. The conserved
hexanucleotide core sequence is shaded, and the possible site for the template switch is indicated by an arrow. The nucleotides in color are derived
directly from the sequence profiles. (A) Leader-body fusion site of mRNA 3-1; (B and C) junction sequence and models of template switch of a
rare variant of mRNA 3-1.


DISCUSSION 

Coronaviruses possess the largest RNA genomes, and the 
genes located on the 3' parts of the genomes are expressed 
through a nested set of subgenomic mRNAs that are both 3' 
and 5' coterminal to the viral genome. There are about 12 
ORFs in the 3' one-third of the SARS CoV genome, and in 
this report we have characterized 10 subgenomic mRNAs, 2 of 
which have not been reported previously. 

The synthesis of subgenomic mRNAs of coronaviruses
involves a discontinuous step in which the 5' leader and 3' body
sequences of mRNA are joined through the
transcription-regulating sequences in the 3' end of the leader and in the
intergenic region preceding each mRNA body (30). Although the
molecular details of the discontinuous RNA transcription are
not completely known, the discovery of transcriptionally active,
subgenomic-size minus strands containing antileader sequence
(23, 24) favors the model of discontinuous transcription during
the minus-strand synthesis (19). In this report, we have shown
the coexistence of both plus- and minus-strand
subgenome-length RNAs in SARS CoV-infected cells, consistent with
previous findings for other coronaviruses (1, 2). Therefore, the
present data are more compatible with the discontinuous
minus-strand synthesis model.

The proposed coronavirus discontinuous transcription
mechanism implies a close interaction between leader TRS (TRS-L)
and complementary body TRS (cTRS-B) in the intergenic
region (30). The eight sgRNAs (mRNAs 2 to 9) of SARS CoV,
which are easily detected by Northern blotting, possess
junction sequences of 6 to 12 nucleotides, all containing the
canonical core sequence (5'-ACGAAC-3'). The 100% identity of
leader and body core sequences for these eight sgRNAs made
it impossible to judge the origin of the junction sequences
(from TRS-L or TRS-B) and the template switch site within
the TRS. However, identification of two novel SARS CoV
sgRNAs with noncanonical fusion sites sheds light on these
questions. The CS-Bs of both mRNA 2-1 (ACGAGC) and
mRNA 3-1 (AAGAAC) (Fig. 4) contain one-nucleotide
mismatches (underlined) with the CS-L (ACGAAC), but the
sequence patterns of CS-B were retained in the junction region,
indicating that the junction sequences of coronavirus sgRNAs
originate from the CS-B, and this, in turn, supports the
discontinuous minus-strand synthesis model. While the mismatch
in mRNA 3-1 is at the second position of the hexanucleotide,
the template switch can be envisaged to take place at the 3' end
of the nascent minus-strand RNA (Fig. 5A), again reinforcing
the model of discontinuous transcription during minus-strand
synthesis (19).
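
The inference about the origin of the junction core can be written as a small decision rule. The sketch below is illustrative only; the function name junction_origin and its return labels are not from the original study. It encodes the reasoning in the text: when leader and body cores are identical the origin cannot be judged, and a one-nucleotide difference makes the junction informative.

```python
def junction_origin(junction_core, leader_core, body_core):
    """Classify which template the core found in an sgRNA junction matches."""
    if leader_core == body_core:
        return "ambiguous"  # identical cores carry no information on origin
    if junction_core == body_core:
        return "body"
    if junction_core == leader_core:
        return "leader"
    return "neither"


CS_L = "ACGAAC"  # leader core sequence

# (core observed in the sgRNA junction, body core sequence), as reported in the text.
OBSERVED = {
    "canonical mRNA": ("ACGAAC", "ACGAAC"),
    "mRNA 2-1": ("ACGAGC", "ACGAGC"),
    "mRNA 3-1": ("AAGAAC", "AAGAAC"),
}

if __name__ == "__main__":
    for name, (junction_core, body_core) in OBSERVED.items():
        print(name, "->", junction_origin(junction_core, CS_L, body_core))
```

With the cores reported here, the canonical mRNAs come out as ambiguous and both novel sgRNAs as body-derived.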

In this study, a rare species of mRNA 3-1 which contains a
junction of only three nucleotides (AAA) was discovered (Fig.
5B and C). Although transcription of this subgenomic RNA
could represent a rare event for SARS CoV, it provides further
evidence for the use of noncanonical transcriptional signals in
synthesis of sgRNAs. The template switching takes place at the
sequence motif AAA, just preceding the leader core sequence
(CS-L) and 3 nt upstream from the complementary body core
sequence of mRNA 3-1. Two models could be proposed for the
synthesis of this rare RNA: (i) the AAA is used directly as a
transcription-regulating signal, and the complementarity
between CS-L and CS-B takes place in the AAA region during
the template switch step (Fig. 5B), and (ii) the interaction and
complementarity between CS-L and CS-B are the same as
those of mRNA 3-1, but the RNA polymerase can slide 4
nucleotides back on the leader template (Fig. 5C). In another
coronavirus, mouse hepatitis virus, the UUA sequence was
characterized as a noncanonical site for subgenomic RNA
synthesis (28), implying that the AAA sequence in a specific
sequence context in SARS CoV might also suffice for
subgenomic synthesis and thus supporting the former model.

Although we have shown that the mRNAs 2-1 and 3-1
identified in this study could be functional messages, we have not
identified their natural expression product in SARS
CoV-infected cells due to the strict control on, and later prohibition
against, using living SARS virus. According to the sequence of
mRNA 2-1, it can lead to translation of a truncated S protein
(S'). A similar truncated S protein has been reported for
porcine respiratory coronavirus (3, 18). Currently, we are making
efforts to construct an infectious cDNA clone of SARS CoV,
and the use of reverse genetics will be helpful to elucidate the
molecular mechanism of the discontinuous transcription and
to reveal the biological functions of the new sgRNAs and their
encoded proteins in the viral life cycle and pathogenesis.